Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies

نویسندگان

  • Christopher Ochs
  • James Geller
  • Yehoshua Perl
  • Yan Chen
  • Junchuan Xu
  • Hua Min
  • James T. Case
  • Zhi Wei
چکیده

OBJECTIVE Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. METHODS An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. RESULTS We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. DISCUSSION The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. CONCLUSIONS An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A tribal abstraction network for SNOMED CT target hierarchies without attribute relationships

OBJECTIVE Large and complex terminologies, such as Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), are prone to errors and inconsistencies. Abstraction networks are compact summarizations of the content and structure of a terminology. Abstraction networks have been shown to support terminology quality assurance. In this paper, we introduce an abstraction network derivation met...

متن کامل

Identifying Missing Hierarchical Relations in SNOMED CT from Logical Definitions Based on the Lexical Features of Concept Names

Objectives. To identify missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names. Methods. We first create logical definitions from the lexical features of concept names, which we represent in OWL EL. We infer hierarchical (subClassOf) relations among these concepts using the ELK reasoner. Finally, we compare the hierarchy obtained from...

متن کامل

FEDRR: Fast, Exhaustive Detection of Redundant Hierarchical Relations in Large Biomedical Ontologies

Redundant hierarchical relations refer to such patterns as two paths from one concept to another, one with length one (direct) and the other with length greater than one (indirect). This paper introduces a novel and scalable approach, called FEDRR – Fast, Exhaustive Detection of Redundant Relations – for quality assurance work during ontological evolution. FEDRR combines the algorithm ideas of ...

متن کامل

Using SPARQL to Test for Lattices: Application to Quality Assurance in Biomedical Ontologies

We present a scalable, SPARQL-based computational pipeline for testing the lattice-theoretic properties of partial orders represented as RDF triples. The use case for this work is quality assurance in biomedical ontologies, one desirable property of which is conformance to lattice structures. At the core of our pipeline is the algorithm called NuMi, for detecting the Number of Minimal upper bou...

متن کامل

Large-Scale, Exhaustive Lattice-Based Structural Auditing of SNOMED CT

One criterion for the well-formedness of ontologies is that their hierarchical structure forms a lattice. Formal Concept Analysis (FCA) has been used as a technique for assessing the quality of ontologies, but is not scalable to large ontologies such as SNOMED CT (> 300k concepts). We developed a methodology called Lattice-based Structural Auditing (LaSA), for auditing biomedical ontologies, im...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of the American Medical Informatics Association : JAMIA

دوره 22 3  شماره 

صفحات  -

تاریخ انتشار 2015